Robot Learning



Point Cloud Matters: Rethinking the Impact of Different Observation Spaces on Robot Learning

Neural Information Processing Systems

In robot learning, the observation space is crucial due to the distinct characteristics of different modalities, which can potentially become a bottleneck alongside policy design. In this study, we explore the influence of various observation spaces on robot learning, focusing on three predominant modalities: RGB, RGB-D, and point cloud. We introduce OBSBench, a benchmark comprising two simulators and 125 tasks, along with standardized pipelines for various encoders and policy baselines. Extensive experiments on diverse contact-rich manipulation tasks reveal a notable trend: point cloud-based methods, even those with the simplest designs, frequently outperform their RGB and RGB-D counterparts. This trend persists in both scenarios: training from scratch and utilizing pre-training. Furthermore, our findings demonstrate that point cloud observations often yield better policy performance and significantly stronger generalization capabilities across various geometric and visual conditions. These outcomes suggest that the 3D point cloud is a valuable observation modality for intricate robotic tasks. We also suggest that incorporating both appearance and coordinate information can enhance the performance of point cloud methods. We hope our work provides valuable insights and guidance for designing more generalizable and robust robotic models.
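The paper's closing suggestion, that point-cloud policies benefit from using both appearance and coordinate information, can be illustrated with a minimal encoder whose per-point input concatenates xyz coordinates with RGB colors. The module below is our illustrative PointNet-style sketch with assumed dimensions, not the benchmark's actual code.

```python
# Minimal sketch (not OBSBench code): a PointNet-style encoder whose per-point
# input concatenates coordinates (xyz) with appearance (rgb), reflecting the
# suggestion to combine both kinds of information.
import torch
import torch.nn as nn

class PointCloudEncoder(nn.Module):
    def __init__(self, out_dim=256):
        super().__init__()
        # 6 input channels: 3 for xyz coordinates + 3 for rgb appearance.
        self.mlp = nn.Sequential(
            nn.Linear(6, 64), nn.ReLU(),
            nn.Linear(64, 128), nn.ReLU(),
            nn.Linear(128, out_dim),
        )

    def forward(self, xyz, rgb):
        # xyz, rgb: (batch, num_points, 3)
        per_point = self.mlp(torch.cat([xyz, rgb], dim=-1))
        # Permutation-invariant max pooling over points -> one scene feature.
        return per_point.max(dim=1).values

# The pooled feature would feed any downstream policy head, e.g.:
# feat = PointCloudEncoder()(torch.rand(2, 1024, 3), torch.rand(2, 1024, 3))  # (2, 256)
```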


RoboHive: A Unified Framework for Robot Learning

Neural Information Processing Systems

Our platform encompasses a diverse range of pre-existing and novel environments, including dexterous manipulation with the Shadow Hand, whole-arm manipulation tasks with Franka and Fetch robots, quadruped locomotion, among others. Included environments are organized by domain, covering hand manipulation, locomotion, multi-task, multi-agent, and muscle-driven settings, and more. In comparison to prior works, RoboHive offers a streamlined and unified task interface that depends only on a minimal set of well-maintained packages, features tasks with high physics fidelity and rich visual diversity, and supports common hardware drivers for real-world deployment. The unified interface of RoboHive offers a convenient and accessible abstraction for algorithmic research in imitation, reinforcement, multi-task, and hierarchical learning. Furthermore, RoboHive includes expert demonstrations and baseline results for most environments, providing a standard for benchmarking and comparisons.
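As a rough illustration of the "streamlined and unified task interface", the sketch below shows the Gym-style interaction loop such a platform typically exposes. The environment id is hypothetical, and the exact registration details and Gym API version may differ from RoboHive's actual interface.

```python
# Hedged sketch of a Gym-style loop over a RoboHive-registered environment.
# The environment id is illustrative; newer Gym/Gymnasium versions also return
# a 5-tuple from step() rather than the classic 4-tuple shown here.
import gym
import robohive  # importing the package is assumed to register its environments

env = gym.make("FrankaReachRandom-v0")  # hypothetical environment id
obs = env.reset()
for _ in range(100):
    action = env.action_space.sample()          # random policy as a placeholder
    obs, reward, done, info = env.step(action)  # classic Gym step signature
    if done:
        obs = env.reset()
env.close()
```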


Dribble Master: Learning Agile Humanoid Dribbling Through Legged Locomotion

Wang, Zhuoheng, Zhou, Jinyin, Wu, Qi

arXiv.org Artificial Intelligence

Humanoid soccer dribbling is a highly challenging task that demands dexterous ball manipulation while maintaining dynamic balance. Traditional rule-based methods often struggle to achieve accurate ball control due to their reliance on fixed walking patterns and limited adaptability to real-time ball dynamics. To address these challenges, we propose a two-stage curriculum learning framework that enables a humanoid robot to acquire dribbling skills without explicit dynamics or predefined trajectories. In the first stage, the robot learns basic locomotion skills; in the second stage, we fine-tune the policy for agile dribbling maneuvers. We further introduce a virtual camera model in simulation that simulates the field of view and perception constraints of the real robot, enabling realistic ball perception during training. We also design heuristic rewards to encourage active sensing, promoting a broader visual range for continuous ball perception. The policy is trained in simulation and successfully transferred to a physical humanoid robot. Experimental results demonstrate that our method enables effective ball manipulation, achieving flexible and visually appealing dribbling behaviors across multiple environments. This work highlights the potential of reinforcement learning in developing agile humanoid soccer robots. Additional details and videos are available at https://zhuoheng0910.github.io/dribble-master/.
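The "heuristic rewards to encourage active sensing" can be pictured as a term that pays the policy for keeping the ball inside the simulated camera's field of view. The functional form, threshold, and weight below are assumptions for illustration, not the authors' actual reward.

```python
# Illustrative active-sensing reward (assumed form, not the paper's exact term):
# reward the policy when the ball stays near the center of the virtual camera's
# field of view, encouraging motions that keep the ball perceivable.
import numpy as np

def active_sensing_reward(ball_pos_cam, half_fov_rad=0.5, weight=0.2):
    """ball_pos_cam: ball position in the camera frame (x right, y up, z forward)."""
    x, y, z = ball_pos_cam
    if z <= 0.0:                            # ball behind the camera: no reward
        return 0.0
    angle = np.arctan2(np.hypot(x, y), z)   # angular offset from the optical axis
    if angle > half_fov_rad:                # outside the simulated field of view
        return 0.0
    return weight * (1.0 - angle / half_fov_rad)  # larger near the view center
```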


SIMPACT: Simulation-Enabled Action Planning using Vision-Language Models

Liu, Haowen, Yao, Shaoxiong, Chen, Haonan, Gao, Jiawei, Mao, Jiayuan, Huang, Jia-Bin, Du, Yilun

arXiv.org Artificial Intelligence

Vision-Language Models (VLMs) exhibit remarkable common-sense and semantic reasoning capabilities. However, they lack a grounded understanding of physical dynamics. This limitation arises from training VLMs on static internet-scale visual-language data that contain no causal interactions or action-conditioned changes. Consequently, it remains challenging to leverage VLMs for fine-grained robotic manipulation tasks that require physical understanding, reasoning, and corresponding action planning. To overcome this, we present SIMPACT, a test-time, SIMulation-enabled ACTion Planning framework that equips VLMs with physical reasoning through simulation-in-the-loop world modeling, without requiring any additional training. From a single RGB-D observation, SIMPACT efficiently constructs physics simulations, enabling the VLM to propose informed actions, observe simulated rollouts, and iteratively refine its reasoning. By integrating language reasoning with physics prediction, our simulation-enabled VLM can understand contact dynamics and action outcomes in a physically grounded way. Our method demonstrates state-of-the-art performance on five challenging, real-world rigid-body and deformable manipulation tasks that require fine-grained physical reasoning, outperforming existing general-purpose robotic manipulation models. Our results demonstrate that embedding physics understanding via efficient simulation into VLM reasoning at test time offers a promising path towards generalizable embodied intelligence. The project webpage can be found at https://simpact-bot.github.io
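The simulation-in-the-loop reasoning can be summarized as a propose / rollout / refine loop. The sketch below is our schematic under assumed interfaces; the callables are placeholders, not SIMPACT's actual API.

```python
# Schematic propose/rollout/refine loop in the spirit of SIMPACT.
# build_sim, propose and refine are caller-supplied callables (placeholders).
def plan_with_simulation(rgbd, task, build_sim, propose, refine, rounds=3):
    sim = build_sim(rgbd)                       # physics scene built from one RGB-D frame
    action = propose(task, rgbd)                # initial VLM action proposal
    for _ in range(rounds):
        rollout = sim.rollout(action)           # simulate the proposed action
        action = refine(task, action, rollout)  # VLM reasons over the simulated outcome
    return action                               # refined, physically grounded action
```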


MP1: MeanFlow Tames Policy Learning in 1-step for Robotic Manipulation

Sheng, Juyi, Wang, Ziyi, Li, Peiming, Liu, Mengyuan

arXiv.org Artificial Intelligence

In robot manipulation, robot learning has become a prevailing approach. However, generative models within this field face a fundamental trade-off between the slow, iterative sampling of diffusion models and the architectural constraints of faster flow-based methods, which often rely on explicit consistency losses. To address these limitations, we introduce MP1, which pairs 3D point-cloud inputs with the MeanFlow paradigm to generate action trajectories in one network function evaluation (1-NFE). By directly learning the interval-averaged velocity via the "MeanFlow Identity", our policy avoids any additional consistency constraints. This formulation eliminates numerical ODE-solver errors during inference, yielding more precise trajectories. MP1 further incorporates classifier-free guidance (CFG) for improved trajectory controllability while retaining 1-NFE inference without reintroducing structural constraints. Because subtle scene-context variations are critical for robot learning, especially in few-shot learning, we introduce a lightweight Dispersive Loss that repels state embeddings during training, boosting generalization without slowing inference. We validate our method on the Adroit and Meta-World benchmarks, as well as in real-world scenarios. Experimental results show MP1 achieves superior average task success rates, outperforming DP3 by 10.2% and FlowPolicy by 7.3%. Its average inference time is only 6.8 ms, 19x faster than DP3 and nearly 2x faster than FlowPolicy. Our project page is available at https://mp1-2254.github.io/, and the code can be accessed at https://github.com/LogSSim/MP1.
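For readers unfamiliar with the "MeanFlow Identity" mentioned above, the MeanFlow formulation defines an interval-averaged velocity and relates it to the instantaneous flow velocity, which is what permits one-step (1-NFE) sampling. The transcription below follows the general MeanFlow formulation; MP1's exact notation may differ.

```latex
% Average velocity over [r, t] along the flow z_tau, the MeanFlow Identity,
% and the resulting one-step (1-NFE) sampling rule (general MeanFlow form;
% MP1's notation may differ).
u(z_t, r, t) = \frac{1}{t - r} \int_{r}^{t} v(z_\tau, \tau)\, \mathrm{d}\tau,
\qquad
u(z_t, r, t) = v(z_t, t) - (t - r)\,\frac{\mathrm{d}}{\mathrm{d}t}\, u(z_t, r, t),
\qquad
z_r = z_t - (t - r)\, u(z_t, r, t).
```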


IGen: Scalable Data Generation for Robot Learning from Open-World Images

Gu, Chenghao, Kang, Haolan, Lin, Junchao, Wang, Jinghe, Wu, Duo, Xie, Shuzhao, Huang, Fanding, Ge, Junchen, Gong, Ziyang, Li, Letian, Zheng, Hongying, Lv, Changwei, Wang, Zhi

arXiv.org Artificial Intelligence

The rise of generalist robotic policies has created an exponential demand for large-scale training data. However, on-robot data collection is labor-intensive and often limited to specific environments. In contrast, open-world images capture a vast diversity of real-world scenes that naturally align with robotic manipulation tasks, offering a promising avenue for low-cost, large-scale robot data acquisition. Despite this potential, the lack of associated robot actions hinders the practical use of open-world images for robot learning, leaving this rich visual resource largely unexploited. To bridge this gap, we propose IGen, a framework that scalably generates realistic visual observations and executable actions from open-world images. IGen first converts unstructured 2D pixels into structured 3D scene representations suitable for scene understanding and manipulation. It then leverages the reasoning capabilities of vision-language models to transform scene-specific task instructions into high-level plans and generate low-level actions as SE(3) end-effector pose sequences. From these poses, it synthesizes dynamic scene evolution and renders temporally coherent visual observations. Experiments validate the high quality of visuomotor data generated by IGen, and show that policies trained solely on IGen-synthesized data achieve performance comparable to those trained on real-world data. This highlights the potential of IGen to support scalable data generation from open-world images for generalist robotic policy training.
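The low-level actions IGen emits are described as SE(3) end-effector pose sequences. A hedged sketch of what such a generated training datum could look like follows; the field names and conventions are our assumptions, not IGen's schema.

```python
# Hedged sketch of an SE(3) end-effector pose sequence as a generated datum.
# Field names and conventions are illustrative assumptions, not IGen's schema.
from dataclasses import dataclass
import numpy as np

@dataclass
class EndEffectorWaypoint:
    position: np.ndarray      # (3,) translation in the scene frame, meters
    quaternion: np.ndarray    # (4,) rotation as (w, x, y, z), unit norm
    gripper_open: bool        # discrete gripper command

@dataclass
class GeneratedEpisode:
    instruction: str                      # scene-specific task instruction
    waypoints: list[EndEffectorWaypoint]  # low-level SE(3) action sequence
    frames: list[np.ndarray]              # rendered, temporally coherent observations
```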


ArtiBench and ArtiBrain: Benchmarking Generalizable Vision-Language Articulated Object Manipulation

Wu, Yuhan, Wei, Tiantian, Wang, Shuo, Wang, ZhiChao, Zhang, Yanyong, Cremers, Daniel, Xia, Yan

arXiv.org Artificial Intelligence

Interactive articulated manipulation requires long-horizon, multi-step interactions with appliances while maintaining physical consistency. Existing vision-language and diffusion-based policies struggle to generalize across parts, instances, and categories. We first introduce ArtiBench, a five-level benchmark covering kitchen, storage, office, and tool environments. ArtiBench enables structured evaluation from cross-part and cross-instance variation to long-horizon multi-object tasks, revealing the core generalization challenges of articulated object manipulation. Building on this benchmark, we propose ArtiBrain, a modular framework that unifies high-level reasoning with adaptive low-level control. ArtiBrain uses a VLM-based Task Reasoner (GPT-4.1) to decompose and validate subgoals, and employs a Hybrid Controller that combines geometry-aware keyframe execution with affordance-guided diffusion for precise and interpretable manipulation. An Affordance Memory Bank continually accumulates successful execution episodes and propagates part-level actionable affordances to unseen articulated parts and configurations. Extensive experiments on ArtiBench show that our ArtiBrain significantly outperforms state-of-the-art multimodal and diffusion-based methods in robustness and generalization. Code and dataset will be released upon acceptance.
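The division of labor between the Task Reasoner, the Hybrid Controller, and the Affordance Memory Bank reads as a simple control loop; the sketch below is our schematic with placeholder objects, not the released code.

```python
# Schematic of the ArtiBrain-style loop described in the abstract.
# task_reasoner, controller and memory are placeholder objects supplied by the caller.
def run_episode(task, observation, task_reasoner, controller, memory):
    subgoals = task_reasoner.decompose(task, observation)     # VLM-based decomposition
    for subgoal in subgoals:
        if not task_reasoner.validate(subgoal, observation):  # subgoal validation
            continue
        affordance = memory.lookup(subgoal, observation)      # reuse part-level affordances
        result = controller.execute(subgoal, observation, affordance)  # keyframe or diffusion mode
        if result.success:
            memory.store(subgoal, result)                     # accumulate successful episodes
        observation = result.observation
    return observation
```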


NOIR 2.0: Neural Signal Operated Intelligent Robots for Everyday Activities

Kim, Tasha, Wang, Yingke, Cho, Hanvit, Hodges, Alex

arXiv.org Artificial Intelligence

Brain-robot interfaces (BRIs) represent a major milestone in the fields of art, science, and engineering. The Neural Signal Operated Intelligent Robots (NOIR) system [1], unveiled in 2023, is a versatile, intelligent BRI system that employs non-invasive electroencephalography (EEG). The system operates on the concept of hierarchical shared autonomy, where humans set high-level objectives, and the robot carries out these objectives through the execution of detailed motor commands. At the time of its introduction, NOIR demonstrated its general-purpose nature by being able to handle a variety of tasks (20 everyday activities) and showing broad accessibility, as it requires minimal training for use by the general public. Moreover, NOIR is adaptive and intelligent, equipped with a broad set of skills that enable it to autonomously perform low-level actions. Human intentions are conveyed, interpreted, and executed by the robots through parameterized primitive skills, such as Pick(obj-A) or MoveTo(x,y).
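The parameterized primitive skills named in the abstract (Pick(obj-A), MoveTo(x, y)) imply a small dispatch layer between decoded intentions and robot controllers. The sketch below is an assumed interface for illustration, not NOIR's implementation.

```python
# Assumed skill-dispatch layer for parameterized primitives such as Pick(obj-A)
# or MoveTo(x, y); names, signatures, and the robot's methods are illustrative.
SKILLS = {}

def skill(name):
    def register(fn):
        SKILLS[name] = fn
        return fn
    return register

@skill("Pick")
def pick(robot, obj_id):
    robot.grasp(obj_id)          # placeholder low-level controller call

@skill("MoveTo")
def move_to(robot, x, y):
    robot.navigate_to(x, y)      # placeholder low-level controller call

def execute(robot, name, *params):
    """Dispatch a decoded human intention to the matching primitive skill."""
    SKILLS[name](robot, *params)

# e.g. execute(robot, "Pick", "obj-A") or execute(robot, "MoveTo", 0.4, 0.2)
```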


Learning to Drive Anywhere with Model-Based Reannotation

Hirose, Noriaki, Ignatova, Lydia, Stachowicz, Kyle, Glossop, Catherine, Levine, Sergey, Shah, Dhruv

arXiv.org Artificial Intelligence

Figure 1 (caption): We train a highly generalizable navigation policy that can control robots in a variety of conditions and be deployed zero-shot in new environments across the world. Our proposed method, Model-Based ReAnnotation, enables imitation learning from noisy, passive data, such as low-quality crowd-sourced demonstrations or even videos from the web.

Developing broadly generalizable visual navigation policies for robots is a significant challenge, primarily constrained by the availability of large-scale, diverse training data. While curated datasets collected by researchers offer high quality, their limited size restricts policy generalization. To overcome this, we explore leveraging abundant, passively collected data sources, including large volumes of crowd-sourced teleoperation data and unlabeled YouTube videos, despite their potential for lower quality or missing action labels. We propose Model-Based ReAnnotation (MBRA), a framework that utilizes a learned short-horizon, model-based expert model to relabel or generate high-quality actions for these passive datasets. This relabeled data is then distilled into LogoNav, a long-horizon navigation policy conditioned on visual goals or GPS waypoints. We demonstrate that LogoNav, trained using MBRA-processed data, achieves state-of-the-art performance, enabling robust navigation over distances exceeding 300 meters in previously unseen indoor and outdoor environments.
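The relabel-then-distill recipe can be shown as a small data pipeline. The sketch below makes assumptions about the interfaces of the expert, the passive dataset, and a PyTorch-style policy and optimizer; it is not the authors' code.

```python
# Schematic of the MBRA-style pipeline from the abstract: a short-horizon,
# model-based expert relabels passive trajectories with actions, and the
# relabeled data is distilled into a long-horizon goal-conditioned policy.
# expert, policy, optimizer and the dataset interface are assumed placeholders.
def relabel_and_distill(passive_dataset, expert, policy, optimizer):
    relabeled = []
    for trajectory in passive_dataset:                   # teleop or web-video frames
        for obs, short_goal in trajectory.windows():     # short-horizon slices
            action = expert.predict(obs, short_goal)     # model-based relabeling
            relabeled.append((obs, trajectory.long_goal, action))
    for obs, goal, action in relabeled:                  # distillation step
        loss = policy.imitation_loss(obs, goal, action)  # PyTorch-style loss assumed
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return policy
```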